
E2e reliability and performance #2201

Open
samuelreichert wants to merge 20 commits into main from e2e-reliability-and-performance

Contributor

@samuelreichert samuelreichert commented May 5, 2026

Pull request type

No code changes (changes to documentation, CI, metadata, etc.)
Test related change (New E2E test, test automation, etc.)


Description

This PR improves E2E test reliability and reduces nightly run duration through a set of infrastructure, stability, and parallelization improvements.

Shared fixtures & Mendix helpers (run-e2e)

  • Added worker-scoped session fixture — each Playwright worker holds one Mendix session, eliminating per-test login/logout overhead and staying within the 5-session developer license limit
  • Added waitForMendixApp(page) helper to replace waitForLoadState("networkidle"), which was unreliable and caused flaky tests
  • Added checkAccessibility shared helper
  • Added smoke suite support via E2E_SUITE=smoke env var and @smoke tag
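A worker-scoped fixture of the kind described in the first bullet might be declared roughly as follows. This is a sketch under assumptions, not the PR's actual fixtures.mjs: the `mendixFixtures` name, the `APP_URL` fallback, the idea that opening the app is what creates the session, and the `window.mx.session.logout()` teardown call are all hypothetical. Only the `[fixtureFunction, { scope: "worker" }]` tuple form is standard Playwright.

```javascript
// Hypothetical app URL; the PR does not say how the URL is configured.
const APP_URL = process.env.URL || "http://localhost:8080";

// Fixture map that a fixtures module could pass to `test.extend(...)`
// from @playwright/test (import omitted to keep the sketch dependency-free).
const mendixFixtures = {
    mendixSession: [
        async ({ browser }, use) => {
            // Setup: runs once per worker, not once per test.
            const context = await browser.newContext();
            const page = await context.newPage();
            await page.goto(APP_URL); // assumed to establish the Mendix session

            // Hand the shared session to every test executed by this worker.
            await use({ context, page });

            // Teardown: guarded logout so a dead page cannot fail the worker.
            try {
                await page.evaluate(() => window.mx?.session?.logout?.());
            } catch {
                // App already gone; nothing left to release.
            }
            await context.close();
        },
        { scope: "worker" }
    ]
};
```

Because the built-in `browser` fixture is itself worker-scoped, a worker-scoped fixture may depend on it. With 4 workers this shape would hold at most 4 sessions at a time, which is what keeps the run under the 5-session limit.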

Flakiness fixes

  • Replaced all page.waitForTimeout() fixed delays with event-based waits
  • Removed waitForLoadState("networkidle") across all specs
  • Fixed race conditions in datagrid filter specs
  • Fixed racy nth(1) assertion in badge-button close page test
  • Migrated remaining specs that the codemod missed

Screenshot baseline hardening

  • Set global threshold: 0.1 and animations: "disabled" in playwright config
  • Removed per-test loose threshold/maxDiffPixels overrides (0.2–0.5, up to 15000px) that were masking real pixel differences
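For orientation, the global screenshot defaults could be expressed along these lines. This is a minimal sketch assuming the values land under `expect.toHaveScreenshot`, where Playwright reads screenshot assertion defaults; the `screenshotDefaults` name is hypothetical and the PR's real config may be structured differently.

```javascript
// Sketch of the relevant playwright.config fields. A real config file would
// export this object, typically wrapped in defineConfig from @playwright/test.
const screenshotDefaults = {
    expect: {
        toHaveScreenshot: {
            // Per-pixel colour tolerance between 0 (strict) and 1 (lax).
            threshold: 0.1,
            // Stop CSS animations and transitions before capturing.
            animations: "disabled"
        }
    }
};
```

`threshold` and `maxDiffPixels` control different things: the first is how far a single pixel's colour may drift, the second is how many pixels may differ at all. A `maxDiffPixels` of several thousand can therefore hide a visibly broken region even when `threshold` is tight.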

ESLint rules

  • Added eslint-plugin-playwright rules to prevent flaky patterns from creeping back: no-wait-for-timeout (error), no-networkidle (warn), prefer-web-first-assertions (warn)
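As a rough illustration, the three rules and their severities map onto an ESLint rules object like the one below. The `playwright/` prefix is the plugin's usual namespace; the `files` glob and the variable names are assumptions, since the PR text does not show the actual config.

```javascript
// Severities as listed in the PR description.
const playwrightRules = {
    "playwright/no-wait-for-timeout": "error",
    "playwright/no-networkidle": "warn",
    "playwright/prefer-web-first-assertions": "warn"
};

// Hypothetical override block restricting the rules to E2E spec files.
// The plugin itself still has to be registered in the surrounding config.
const e2eOverride = {
    files: ["e2e/**/*.spec.js"], // assumed glob, not taken from the PR
    rules: playwrightRules
};
```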

Parallelization

  • Nightly workflow now runs 4 parallel matrix runners instead of a single sequential job
  • Tests split using weighted bin-packing (FFD algorithm) for balanced runner load
  • Expected wall-clock time: ~80 min → ~20–25 min
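The balancing idea can be sketched as follows. This is not the PR's run-e2e-in-chunks.mjs: it is a decreasing-order greedy (sort by weight, then place each package on the currently lightest runner), which is the usual relative of first-fit decreasing when the number of bins is fixed rather than their capacity. The function name and the weights are made up for illustration.

```javascript
// Distribute weighted packages over a fixed number of runners.
// Heaviest packages are placed first; each goes to the least-loaded runner.
function splitIntoChunks(weights, chunkCount) {
    const chunks = Array.from({ length: chunkCount }, () => ({ total: 0, items: [] }));
    const byWeightDesc = Object.entries(weights).sort(([, a], [, b]) => b - a);

    for (const [name, weight] of byWeightDesc) {
        // Pick the runner with the smallest accumulated weight so far.
        const lightest = chunks.reduce((min, chunk) => (chunk.total < min.total ? chunk : min));
        lightest.items.push(name);
        lightest.total += weight;
    }
    return chunks;
}

// Hypothetical per-package durations in minutes.
const exampleWeights = {
    "datagrid-web": 20,
    "maps-web": 12,
    "gallery-web": 10,
    "video-player-web": 9,
    "column-chart-web": 8,
    "heatmap-chart-web": 6,
    "tree-node-web": 5,
    "checkbox-radio-selection-web": 2
};

const chunks = splitIntoChunks(exampleWeights, 4);
console.log(chunks.map(chunk => chunk.total)); // [ 20, 17, 18, 17 ]
```

With these example weights the total of 72 minutes is spread as 20, 17, 18 and 17, so the slowest runner determines a wall-clock time of 20 minutes. In practice the heaviest single package sets a floor that no distribution can go below.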

Turbo

  • Replaced stale Cypress input globs (cypress/**, cypress.config.*) with their Playwright equivalents (e2e/**, playwright.config.*) so that changes to E2E files correctly invalidate the Turbo cache

What should be covered while testing?

  • Trigger the nightly workflow manually and verify all 4 matrix runners complete without infrastructure errors
  • Linux screenshot baselines for column-chart-web, heatmap-chart-web, datagrid-web (virtual scrolling), and tree-node-web need regeneration

@samuelreichert samuelreichert force-pushed the e2e-reliability-and-performance branch 2 times, most recently from a75f91f to 8d2f812 Compare May 18, 2026 08:55
@samuelreichert samuelreichert marked this pull request as ready for review May 18, 2026 08:56
samuelreichert and others added 20 commits May 18, 2026 10:56
Introduces reusable test infrastructure:
- fixtures.mjs: custom Playwright test with auto Mendix readiness wait
  and guarded session cleanup (replaces 54 manual afterEach blocks)
- mendix-helpers.mjs: waitForMendixApp, waitForWidget, waitForListData,
  safeLogout, navigateToPage utilities

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…defaults

- Add actionTimeout (10s) and navigationTimeout (30s) to catch hangs
- Add global screenshot defaults: animations disabled, threshold 0.1
- Replaces per-test threshold overrides with a sensible global default

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace cypress/**,cypress.config.* with e2e/**,playwright.config.*
so turbo correctly tracks E2E test file changes for cache invalidation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add eslint-plugin-playwright rules for E2E spec files:
- no-wait-for-timeout (error): blocks new hardcoded delays
- no-networkidle (warn): flags unreliable networkidle usage
- prefer-web-first-assertions (warn): encourages auto-retrying assertions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
migrate-spec.mjs automates three transforms per spec file:
1. Replace @playwright/test import with shared fixtures
2. Remove afterEach logout blocks (fixture handles cleanup)
3. Replace waitForLoadState("networkidle") with waitForMendixApp

Supports --dry-run flag for preview. Run per file or batch via find.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- column-chart: wait for .plot-container visibility instead of 1000ms
- datagrid-dropdown-filter: assert row count instead of 300ms delay
- gallery: assert item count after filter instead of 1000ms delay
- heatmap-chart: remove 500ms delay (colorbar visibility already asserted)
- skiplink: use toHaveCSS assertion instead of 1000ms + manual evaluate

Eliminates all 12 hardcoded timeout calls from E2E specs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Run codemod across 55 E2E spec files to:
- Replace @playwright/test import with @mendix/run-e2e/fixtures
- Remove manual afterEach session logout (fixture handles cleanup)
- Replace waitForLoadState("networkidle") with waitForMendixApp()

The shared fixture provides automatic Mendix readiness detection and
guarded session teardown, eliminating 54 manual logout blocks and
154 unreliable networkidle waits.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rides

Remove all per-test threshold (0.1-0.5) and maxDiffPixels (4000-15000)
overrides. Global config now sets threshold: 0.1 and animations: disabled.

- maps-web/google: replace meaningless 15000px-diff screenshot with
  structural assertion (canvas/div visibility)
- maps-web/here,mapbox,openstreet: remove maxDiffPixels: 4000
- charts (pie, heatmap, time-series, column): remove threshold: 0.5
- rich-text, slider, timeline: remove threshold: 0.2-0.4

Screenshot baselines will need regeneration on CI after this change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace sequential --workspace-concurrency=1 execution with 4 parallel
runners using run-e2e-in-chunks.mjs bin-packing distribution.

Expected ~75% reduction in nightly E2E wall-clock time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… tag

When E2E_SUITE=smoke is set, only tests tagged with @smoke will run.
This enables fast PR feedback by running a minimal subset of tests.

Usage in test files:
  test("renders widget @smoke", async ({ page }) => { ... });

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
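One way the E2E_SUITE switch could be wired is through Playwright's `grep` config option, which filters tests by title. The helper below is a sketch under that assumption; `suiteGrep` is a hypothetical name and the PR may implement the filter differently.

```javascript
// Returns a title filter for the smoke suite, or undefined to run everything.
// The result is meant to be assigned to `grep` in the Playwright config.
function suiteGrep(env = process.env) {
    return env.E2E_SUITE === "smoke" ? /@smoke/ : undefined;
}
```

In a config this would read `grep: suiteGrep()`. Leaving `grep` undefined applies no filter, so a run without the env var still executes the full suite.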
Adds checkAccessibility(page, selector, options) to mendix-helpers.
Wraps @axe-core/playwright with sensible defaults (wcag21aa tags).

Usage:
  import { checkAccessibility } from "@mendix/run-e2e/mendix-helpers";
  await checkAccessibility(page, ".mx-name-widget1");

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Without explicit exports, Node cannot resolve @mendix/run-e2e/fixtures
or @mendix/run-e2e/mendix-helpers from widget spec files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…idle

The previous implementation resolved too early — mx.session exists before
sub-page widgets render. Now waits for:
1. domcontentloaded
2. mx.session + no progress indicator + .mx-page exists
3. networkidle (ensures widget data fetches complete)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…der)

Codemod regex only matched `{ test, expect }` ordering. Six specs used
`{ expect, test }` and were left importing from @playwright/test — no
fixture cleanup meant session exhaustion after ~5 tests.

Also fix codemod regex to handle both orderings for future runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
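The ordering problem is easy to reproduce with a small import-rewriting function. The sketch below is not the PR's migrate-spec.mjs; `PLAYWRIGHT_IMPORT` and `rewriteFixtureImport` are hypothetical names, and the pattern only illustrates accepting `test` and `expect` in either order.

```javascript
// Matches `import { test, expect }` or `import { expect, test }` (or a single
// name) from @playwright/test, with either quote style and optional semicolon.
const PLAYWRIGHT_IMPORT =
    /import\s*\{\s*((?:test|expect)(?:\s*,\s*(?:test|expect))?)\s*\}\s*from\s*["']@playwright\/test["'](;?)/;

// Points the import at the shared fixtures module, keeping the original
// name order and the presence or absence of the trailing semicolon.
function rewriteFixtureImport(source) {
    return source.replace(PLAYWRIGHT_IMPORT, (_match, names, semicolon) => {
        const normalized = names
            .split(",")
            .map(name => name.trim())
            .join(", ");
        return `import { ${normalized} } from "@mendix/run-e2e/fixtures"${semicolon}`;
    });
}
```

A source file that imports anything else from `@playwright/test` is left untouched by this pattern, so such files would still need manual review.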
- Make mendixSession worker-scoped so each worker holds 1 Mendix session,
  preventing session exhaustion with parallel workers (4 workers < 5 license limit).
- Fix waitForFunction call: pass timeout as 3rd arg (options), not 2nd (arg).
  Previously actionTimeout (10s) was used instead of intended 60s.
- Set local workers to 4 (safe with worker-scoped sessions).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
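The argument-order fix is worth spelling out, because `page.waitForFunction(pageFunction, arg, options)` takes its options third. The sketch below assumes a readiness check built from the conditions named in this PR (`mx.session` present, `.mx-page` in the DOM, no progress indicator). The function name and the `.mx-progress` selector are assumptions, and this is not the PR's waitForMendixApp implementation.

```javascript
const READY_TIMEOUT_MS = 60_000;

// Waits until the Mendix client looks ready on the given Playwright page.
async function waitForAppReady(page, timeout = READY_TIMEOUT_MS) {
    await page.waitForLoadState("domcontentloaded");
    await page.waitForFunction(
        // Runs in the browser and is polled until it returns a truthy value.
        () =>
            Boolean(window.mx && window.mx.session) &&
            document.querySelector(".mx-page") !== null &&
            document.querySelector(".mx-progress") === null, // assumed selector
        undefined, // 2nd parameter: argument handed to the page function
        { timeout } // 3rd parameter: options, where the timeout belongs
    );
}
```

Passing `{ timeout }` in the second position would hand it to the page function as its argument and leave the options empty, so the wait would fall back to the configured default timeout. That matches the symptom described in the commit message.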
- FilteringSingle: replace allTextContents() after toHaveText() with
  single toHaveText([...array]) call that retries atomically.
- FilteringMulti: remove waitForTimeout(300) and allTextContents() pattern,
  use toHaveText()/toContainText() for auto-retry assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…specs

- video-player-web: remove redundant waitForMendixReady (fixture handles it).
- checkbox-radio-selection-web: replace networkidle waits with waitForMendixApp.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…itForDataReady

networkidle hangs forever on pages with streaming content (video embeds,
websockets). The mx.session + .mx-page check is sufficient for readiness.
Add waitForDataReady helper for tests that explicitly need networkidle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migration to shared fixtures and waitForMendixApp is complete across all specs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@samuelreichert samuelreichert force-pushed the e2e-reliability-and-performance branch from 8d2f812 to cf624f0 Compare May 18, 2026 08:56